A framework for integrating inferred movement behavior into disease risk models

Movement behavior is an important contributor to habitat selection and its incorporation in disease risk models has been somewhat neglected. The habitat preferences of host individuals affect their probability of exposure to pathogens. If preference behavior can be incorporated in ecological niche models (ENMs) when data on pathogen distributions are available, then variation in such behavior may dramatically impact exposure risk. Here we use data from the anthrax endemic system of Etosha National Park, Namibia, to demonstrate how integrating inferred movement behavior alters the construction of disease risk maps. We used a Maximum Entropy (MaxEnt) model that associated soil, bioclimatic, and vegetation variables with the best available pathogen presence data collected at anthrax carcass sites to map areas of most likely Bacillus anthracis (the causative bacterium of anthrax) persistence. We then used a hidden Markov model (HMM) to distinguish foraging and non-foraging behavioral states along the movement tracks of nine zebra (Equus quagga) during the 2009 and 2010 anthrax seasons. The resulting tracks, decomposed on the basis of the inferred behavioral state, formed the basis of step-selection functions (SSFs) that used the MaxEnt output as a potential predictor variable. Our analyses revealed different risks of exposure during different zebra behavioral states, which were obscured when the full movement tracks were analyzed without consideration of the underlying behavioral states of individuals. Pathogen (or vector) distribution models may be misleading with regard to the actual risk faced by host animal populations when specific behavioral states are not explicitly accounted for in selection analyses. To more accurately evaluate exposure risk, especially in the case of environmentally transmitted pathogens, selection functions could be built for each identified behavioral state and then used to assess the comparative exposure risk across relevant states. The scale of data collection and analysis, however, introduces complexities and limitations for consideration when interpreting results.

These measures were calculated for the four cloudless images available during the 2009 anthrax season (March 22, April 23, May 9, and May 25) and the four cloudless images available during the 2010 anthrax season (February 5, April 10, May 12, and May 28). A single mean layer was then calculated for the Greenness and Wetness metrics in each year. All of these layers were obtained at a resolution of 30 meters.
The mean Wetness and mean NDVI measures were highly (negatively) correlated at the landscape scale in the region through which the zebra moved (R = −0.95 for the 2009 season and R = −0.94 for the 2010 season), and the latter was eliminated from subsequent analyses. The correlations between these mean Greenness and mean Wetness at the landscape scale were relatively low (R = −0.59 in 2009 and R = −0.47 in 2010), so both were maintained as potential predictors.
To account for the potential impact of human development in the area, primary roads (i.e., those made of tar or gravel) were mapped and recast in the form of a road density layer. The road density layer was created in ArcMap 10.3.1 by calculating the length of road (in meters) per unit area (square meters) in each 30 meter raster cell. This layer exhibited low covariance with the other two that were maintained in the predictor variable set, resulting in a total of three continuous predictor variables. Other frequently used continuous variables, such as elevation, slope, and aspect, were eliminated a priori due to the natural homogeneity of the study site with regard to those variables. Though potentially important, particularly in terms of aggregating anthrax spores [2], the minute differences in elevation across Etosha National Park were not detectable at the resolution at which data were available.

Selection of environmental prediction layers
For all three categories (soil characteristics, bioclimatic variables, and vegetation indices), sets of variables were initially selected based on associations with B. anthracis survival in other modeling studies. The soil component was summarized using soil pH in H 2 O (pH), organic carbon content (OC), and cation exchange capacity (CEC). All three of these layers were obtained from the SoilGrids database at a resolution of 250m. The five bioclimatic variables consisted of mean annual temperature (bio1), mean temperature range (bio7), annual precipitation (bio12), precipitation of the wettest month (bio13) and precipitation of the driest month (bio14). The latter variable was removed because it was uninformative in the region of interest (i.e., every cell had the same value). The bioclimatic variable layers were obtained from the WorldClim database at a resolution of 30 arc-seconds (≈1 km). The potential vegetation indices were calculated based on the Landsat 7 NDVI 8-Day Composite layers, which were obtained at a resolution of 30m. Due to the disparity in resolution, the soil characteristic and bioclimatic variables were upsampled using bilinear interpolation. This enabled compatibility between the low resolution bioclimatic data and the higher resolution vegetation data. Because spore persistence may exhibit dependence on local vegetation patterns during the entire period of sampling (i.e., the year of deposition and the two subsequent years), the vegetation metrics were calculated for the three year periods between initial discovery and final sampling. The vegetation indices that served as potential predictors were the mean NDVI (mean ndvi), maximum NDVI (max ndvi), minimum NDVI (min ndvi), and the range of NDVI over the period of observation (range ndvi). To minimize the covariance between predictor variables used in the model for each year, a Pearson correlation matrix was calculated, and any pairs of variables whose coefficient of correlation was > 0.8 were reduced to a single variable as explained below.

MaxEnt modeling details
To parameterize a MaxEnt model, the background distribution of predictors must be considered. This requires a sampling protocol in which points are randomly dispersed throughout the region of interest and the values of the environmental predictors at those points (i.e., pseudo-absence points) are recorded for comparison with the presence points. In this case, 500 background points were selected. Due to the association of carcasses with a particular deposition year, these background points were divided into three groups in proportion to the number of observed carcasses in each of the three years (2010,2011,2012). The result was a set of pseudo-absence points consisting of 312, 50, and 138 points associated with 2010, 2011, and 2012 seasons, respectively. The predictor values were extracted for each presence and pseudo-absence point according to its deposition year. An initial MaxEnt model was run on these data and the full candidate predictor set using the implementation in the dismo package (version 1.1-4; [3]) in R (version 3.4.3; [4]). Following an investigation of the variable contributions to this full model (generated as a standard output of the maxent function), variables exhibiting covariance with another predictor were culled such that the variable in the pair with the higher contribution to the MaxEnt model was maintained and its counterpart eliminated. Finally, another MaxEnt model was run on the reduced predictor variable set.

Carcass dataset robustness
To verify that the reduced carcass dataset was sufficient for the derivation of the anthrax risk layer, two alternative approaches were applied. The first was a simple calculation of the area under the receiver operating characteristic (ROC) curve using the comprehensive set of observed anthrax carcasses from 2009 and 2010. The total number of test points in this set was 237 (69 from 2009 and 168 from 2010). The second approach involved the use of Warren's I metric [5]. This metric, which ranges in value from 0 (indicating no overlap) to 1 (complete overlap), aims to measure niche equivalency using the output raster layers of species distribution models. Here, 100 separate niche models were created using random selections of 40 points from the full carcass dataset (including both the set of 40 sites where anthrax concentrations were measured in subsequent years and the set of 237 observed carcasses from 2009 an 2010). In addition, 500 pseudo-absence points were randomly assigned from a set of 1149. The resulting MaxEnt model based on each random subset of the presence and pseudo-absence data was projected onto the predictor layers from the 2009 and 2010 seasons, and the suitability maps were compared to the final maps derived based on only the 40 presence points at which the concentration of anthrax was measured in subsequent years. The Warren's I metric was then calculated for both the 2009 and 2010 season to determine the similarity and judge the efficacy of using the reduced dataset on B. anthrasis persistence rather than carcass presence.